Abstract: Microarray technology is an important biotechnological tool that records the expression levels of thousands of genes simultaneously across many samples. Of the large number of genes present in gene expression data, only a small fraction is useful for a given diagnostic test. We therefore apply a feature subset selection approach that reduces dimensionality, removes irrelevant data, and improves diagnostic accuracy. The approach clusters genes by their interdependence and mines important patterns from the gene expression data using the Spatial EM algorithm, which computes the spatial mean and a rank-based scatter matrix to extract relevant patterns; classification is then applied to diagnose diseases. Semi-supervised clustering is shown to be effective for identifying biologically important gene clusters with high predictive capability. Experimental results indicate that the Spatial EM based classification approach improves accuracy in disease diagnosis.
Keywords: Microarray, Gene Expression, Spatial EM, Scatter Matrix, Disease Diagnosis.
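
The two robust statistics named in the abstract can be illustrated with a minimal sketch. The abstract does not say which estimators are used, so the following is an assumption: the location estimate is taken to be the spatial (geometric) median, computed by Weiszfeld iteration, and the scatter estimate is taken to be the plain sample covariance of spatial ranks. The function names `spatial_median`, `spatial_rank`, and `rank_scatter_matrix` are hypothetical, and no EM weighting or clustering step is shown.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def spatial_median(points, tol=1e-9, max_iter=500):
    """Assumed location estimator: Weiszfeld iteration for the point that
    minimizes the sum of Euclidean distances to all samples."""
    n, dim = len(points), len(points[0])
    # Start from the ordinary coordinate-wise mean.
    m = [sum(p[j] for p in points) / n for j in range(dim)]
    for _ in range(max_iter):
        num, den = [0.0] * dim, 0.0
        for p in points:
            d = _dist(p, m)
            if d < 1e-12:
                continue  # skip a sample that coincides with the estimate
            den += 1.0 / d
            for j in range(dim):
                num[j] += p[j] / d
        if den == 0.0:
            break
        new_m = [v / den for v in num]
        shift = _dist(new_m, m)
        m = new_m
        if shift < tol:
            break
    return m


def spatial_rank(x, points):
    """Average unit vector pointing from each sample towards x."""
    dim = len(x)
    r = [0.0] * dim
    for p in points:
        d = _dist(x, p)
        if d < 1e-12:
            continue  # a sample contributes nothing to its own rank
        for j in range(dim):
            r[j] += (x[j] - p[j]) / d
    return [v / len(points) for v in r]


def rank_scatter_matrix(points):
    """Assumed scatter estimator: average outer product of spatial ranks."""
    n, dim = len(points), len(points[0])
    scatter = [[0.0] * dim for _ in range(dim)]
    for x in points:
        r = spatial_rank(x, points)
        for a in range(dim):
            for b in range(dim):
                scatter[a][b] += r[a] * r[b] / n
    return scatter


# Toy data (not from the paper): four clustered samples and one outlier.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
center = spatial_median(samples)   # stays near the cluster, about (0.79, 0.79)
scatter = rank_scatter_matrix(samples)
```

On this toy data the ordinary mean is (20.4, 20.4), while the spatial median remains inside the cluster. This kind of robustness to outlying samples is the usual motivation for rank-based estimators on noisy expression data.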